home *** CD-ROM | disk | FTP | other *** search
Text File | 1987-08-05 | 63.5 KB | 1,668 lines |
-
-
-
-
-
-
-
-
-
-
- NARC - A STAND-ALONE DE-ARCHIVE UTILITY
- (no other files required)
-
-
-
-
- Documentation for NARC.EXE
-
-
- Written by Gary Conway
-
-
- Infinity Design Concepts
-
-
- Louisville, Kentucky
-
-
- Copyright (c) 1987
-
-
- Version 1.2
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
-
-
-
-
-
-
-
-
-
- NARC.EXE is placed in the public domain under the user
- supported shareware concept. NARC.EXE is and will remain the
- property of Gary Conway. This program may not be used in any
- connection with commercial ventures, nor as a sales aid,
- without the expressed written consent of the author. All
- rights are reserved.
-
-
- Infinity Design Concepts
- 1052 Parkway Drive
- Louisville, Kentucky 40217
-
- Member IEEE
- KIPCUG
- PCCL
- KKUG
- NSPE
-
-
- All new releases of NARC.EXE and all other IDC utilities
- can be located -FIRST- on ;
-
- The SoftStone RCPM FOG #24 (supporting CP/M and MSDOS)
- (502)241-4109
- 30 MEG
- 300/1200 baud
- 24 hrs.
- Louisville, Kentucky
-
- Curt Edwards - SYSOP
-
- Sponsored by: Kentucky Kaypro Users Group
- Accounting Computer Systems
- First Osborne Group
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
-
-
-
-
-
-
-
-
-
- REGISTRATION
-
-
- If you find yourself using NARC, please take the time
- to do the right thing and that is register your copy. You
- have been provided the opportunity to freely test the program
- before even thinking about registering. This is only fair,
- so, in fairness, you should reciprocate and register your
- copy, if you continue using the program. There are several
- ways to register NARC. Registered users will be notified of
- updates to the program.
-
-
- Disk only with current version .................. $20.00
- (includes manual on disk)
- Printed manual .................................. $15.00
- (bound and printed manual)
- Printed manual and disk ......................... $35.00
- Site License ................................... $50.00
- (required for business use)
-
- You will find the registration form in the ARChive with
- this document under the name REGISTER.FRM. Please use this
- form for registration.
-
-
-
- THIS IS NOT A FREE PROGRAM
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- ═════════════════════════════════════════
- 70% F A S T E R E X T R A C T I O N
- ═════════════════════════════════════════
-
-
-
- There has been a large number of requests for an increase in
- extraction speed. NARC version 1.2 demonstrates a 70%
- increase over previous releases.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
-
-
-
-
- .............................................................................
- ........................ ............................
- ........................ TABLE OF CONTENTS ............................
- .............................................................................
-
-
- Page
- WHAT IS IT ANYWAY.................................. 1
- ACKNOWLEDGEMENTS................................... 1
- COMPATIBILITY...................................... 2
- Author's Ramblings............................. 2
- ARChive Storage Methods Supported By NARC...... 2
- Packing
- Squeezing
- Crunching
- Squashing
- OVERVIEW........................................... 3
- COMMANDS........................................... 4
- Extract Command................................ 4
- View Command................................... 5
- Print Command.................................. 5
- ARC-wind Command............................... 6
- DRV-wind Command............................... 6
- SUB-wind Command............................... 6
- Quit Command................................... 6
- ALTERNATE COMMANDS................................. 6
- Function keys
- Find Command
- Kill File Command
- Page UP, DOWN, HOME,END
- OPERATING HINTS AND SHORTCUTS...................... 8
- ERROR MESSAGES..................................... 9
- ARCHIVE FILE FORMATS AND GENERAL INFORMATION....... 10
- Packing........................................ 10
- HUFFMAN CODING (SQUEEZING)......................... 11
- CRUNCHING (LZW COMPRESSION)........................ 15
- DETAILS OF STORAGE VERSIONS........................ 17
- ARChive file Header Structure.................. 19
- HASHING............................................ 20
- CRC - calculations............................. 20
- ARC RELEASE DATES AND VERSIONS..................... 21
- NARC REVISION HISTORY.............................. 22
-
-
-
- ........................................................................
- Narc (c) 1987 Infinity Design Concepts all rights reserved
- ........................................................................
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- ═══════════════════
- WHAT IS IT ANYWAY ?
- ═══════════════════
-
-
- NARC is a menu driven de-ARChive facility, written entirely
- in assembler. NARC allows you to easily move from ARC file to
- ARC file, with the option of viewing, printing, extracting or
- deleting the subfiles from the ARChive. The program may be
- operated from the mouse or the keyboard. Menus are of the
- musical popup variety to add a little "TechNoFlash" to the
- proceedings. NARC is the culmination of about six months of
- frustrating effort and 8000 + lines of 8088 source code.
-
- Why....
-
-
- Because I use a lot of ARC files and ARC.EXE and the
- clones are reminiscent of the early Ward Christensen
- CP/M days in user interface etiquette, I wanted something a
- little more flexible and friendly to use. I would like to
- pause here for a second and give a little credit to Mr.
- Christensen ( the Don Garlits of CP/M ) for the fine FREE
- utilities he has given to ALL of us over the years. The next
- time you do a modem transfer, you can thank him for the
- original XMODEM from which all others have transpired.
-
- Why NARC...
-
- It seemed like a good idea. Short for uN-ARC. The idea was
- originally Bob Freed's.
-
- Acknowledgments..
-
- I would like to thank Bob Freed for his allowing me to
- examine his Z80 code before writing NARC. Bob wrote UNARC for
- the CP/M world and is ( as of this writing 4/28/87) working
- on NOAH the ARCing program for CP/M. I would also like to
- thank System Enhancement Associates for releasing their
- source code in "C". Without both of the above, NARC would
- have been a much larger chore than it was. Note also that the
- crunching algorithm used in ARC.EXE was taken from COMPRESS,
- used in UNIX. A special thanks to Curt Edwards, Jerry Taylor,
- Frank Roemer, Paul Bowling and Ken Romines for their "eagle-
- eyes" in locating errors.
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 1
-
- ═════════════
- COMPATIBILITY
- ═════════════
-
-
- NARC is compatible with all known "skrunching" algorithms,
- that is up to and including Squashing. NARC is compatible
- with ARC.EXE version 5.21 and PKxARC. NARC supports Squashing
- which is nothing more than a variation of the crunching
- algorithm, yet it is the easiest (and most logical) of the
- crunching methods to code. I have heard a lot of criticism of
- squashing, but those folks need to get up with the times,
- squashing is (and should be) here to stay. NARC also
- recognizes the .ARK extension soon to be prevalent in the
- CP/M world via Bob Freed's CP/M archive facility, NOAH.
-
-
- Author's Ramblings
-
-
- System Enhancement Associates, I am told is dropping the ball
- as far as ARC.EXE goes. I think that Thom Henderson deserves
- a great round of applause for his contribution and help with
- a formidable problem, namely, storage space (or lack
- thereof).
-
- The oldest version of ARC.EXE that I can find is version
- 3.10, released 5-1-85. This version supports storage methods
- up to and including squeezing (no crunching). If anyone has
- an older version I would be interested in seeing it. Here is
- a list of the versions that I do have and would be interested
- in getting any other versions floating around.
-
- 3.10 4.10 4.50 4.52 5.00 5.10 5.12 5.20
-
-
- ═════════════════════════════════════════
- ARCHIVE STORAGE METHODS SUPPORTED BY NARC
- ═════════════════════════════════════════
-
-
-
- Packing - all versions
- Squeezing - Huffman Coding
- Crunching - all versions (LZW encoding)
- Squashing - one version
-
-
- Note: LZW stands for Lempel-Ziv-Welch
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 2
-
- OverView...
-
- When NARC is first invoked, it saves the current drive/path
- for use again on exit, so you always end up where you
- started. Some folks like that and some don't. I DO. NARC
- first searches the default path for ARC files and if any are
- found they are displayed in a window on the left side of the
- screen. The arrow keys (or the mouse) may be used to move the
- cursor bar up and down the window, there are three ways to
- select the highlighted ARC file ;
-
- (1) Hit the ENTER key
- (2) Press the left mouse button
- (3) Hit the F1 key.
-
- NOTE: Function keys F1 and F2 mimic the mouse buttons as ;
-
- F1 = left mouse button
- F2 = right mouse button
- F3 = both mouse buttons
-
- NARC will determine whether a monochrome or color monitor is
- being used and act accordingly. EGA is not directly supported
- at this point, because I don't have one and can't test the
- routines.
-
-
- After selecting the sub-file of interest, NARC displays all
- of the ARC sub-files and their statistics on the screen. You
- are also given a menu bar at the bottom of the screen. You
- may use the arrow keys or the mouse to move the cursor bar to
- the desired selection and then select with the F1 key or the
- ENTER key or the left mouse button. As the cursor bar is
- moved, you are also given a brief description of the
- highlighted command. The commands will now be discussed
- individually.
-
-
- Note: You may also select any option from the command bar by
- entering the first letter of the command.
-
- The ESCape key will exit the program from any of the windows
- or the command bar.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 3
-
- ════════
- COMMANDS
- ════════
-
-
- ═══════════════
- Extract Command
- ═══════════════
-
- Selecting this option will cause another prompt to be
- displayed, asking whether you wish to extract the highlighted
- file or tagged files. (Files are tagged with the space bar).
- "Point and shoot" here again as before. The ESC key or the
- right mouse button will abort the operation. If the disk
- becomes full, you will be informed and have the option of
- aborting or continuing.
-
-
- Highlighted File
-
- When EXTRACT is selected, you will be asked for a
- drive/path to extract the file to. If you simply hit ENTER or
- the F1 key or the left mouse button, the file will be
- extracted to the default drive/path. You may also enter any
- valid DOS drive/path. The ESC key or the F2 key or the right
- mouse button will abort the operation.
-
-
- Tagged Files
-
- The Space bar (or F3 key) is used to TAG the current file.
- When a file is tagged, an asterisk will be displayed on the
- line with the current file in column 80. As a side note, the
- asterisk was chosen as a reminder of the CP/M days of NSWEEP
- and B29. The space bar will also unTAG a file, thus it is a
- "toggle". When the space bar is pressed, an asterisk will
- appear as described above and the cursor bar will move to
- the next file. When the "TAGGED" option is selected from the
- command line, all files that have been tagged, will be
- extracted to the SAME drive/path.
-
- After the file is extracted, it's date and time are set to
- those contained in the ARC file. The file is also checked for
- size and CRC, if both of these do not match exactly what was
- contained in the ARC file header, then an error has occurred
- and the user is notified The files will also remain tagged
- after the extraction.
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 4
-
- ════════════
- View Command
- ════════════
-
- This option will display the currently highlighted file on
- the screen. The space bar is used to pause and the ESC key is
- used to abort. No mouse support here, since it slows things
- down too much.
-
- ═════════════
- Print Command
- ═════════════
-
- The print option will print the currently highlighted file.
- After selecting the print option, you will be asked which
- character pitch you want to print in. Enter the number that
- you wish (or 0 for the default pitch) and the printer will be
- set to that pitch.
-
- NOTE: The printer strings that come installed with NARC are compatible
- with EPSON printer strings. If you wish to install NARC with
- your own strings, see NARCCFG.DOC for complete instructions.
-
- After you have selected the printer pitch, you will be
- shown three more options. Use the arrow keys to move left and
- right. The space bar (or right mouse) is used to toggle the
- options ON or OFF. When you have finished and are ready to
- print, hit ENTER (or left mouse button). The options are;
-
-
- Format - YES - This option causes NARC to format the
- output with page breaks and page numbers.
- NO - NARC does not format the file.
-
- The following two options work independently of the Format option.
-
- Strip High - YES - NARC will strip the high bit off each
- character before it is sent to the
- printer. Some word processors set this
- high bit on some characters as a means of
- text formatting. These characters will
- print as garbage usually.
- NO - NARC will not strip the high bit.
-
- Strip Control - YES - NARC will strip all control characters
- from the file before it is printed. This
- is useful on files that have embedded
- formatting characters, and you wish to
- have NARC provide the formatting.
- NO - NARC will not strip the control chars.
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 5
-
- ════════════════
- ARC-wind Command Note: The right mouse button will also pop
- ════════════════ this window up.
-
- This option will display the window containing all of the
- ARC files in the current sub-directory. Move the cursor bar
- up and down with the mouse or arrow keys and select with the
- F1 key or ENTER key or left mouse button.
-
- ════════════════
- DRV-wind Command
- ════════════════
-
- This option will pop up a window that contains all of the
- logical drives that DOS reports to NARC. Select as before and
- the ARC-window will be popped up so that you can then choose
- an ARC file to examine.
-
- ════════════════
- SUB-wind Command
- ════════════════
-
- This option will pop up a window that contains all of the
- sub-directories in the current directory. Point and shoot as
- before. After making your selection, the window is
- automatically popped up again showing the sub-directories in
- the new current directory. When you get to the directory that
- you want, hit the F2 key or the right mouse button and the
- window will be put away and the ARC-window will be popped up
- so that you can then select an ARC file.
-
- ════════════
- Quit Command
- ════════════
-
- I hope I don't need to describe this one.
- NOTE: The ESCape key will also exit NARC.
-
- ══════════════════
- ALTERNATE COMMANDS
- ══════════════════
-
- The extra commands can be located on the help screen, which
- is invoked by the F10 key.
-
- F1 - key
- Will select the highlighted option.
-
- F2 - key
- This keys function varies with the window that is on the
- screen at any one time. When an ARC file is opened and the
- subfile list is onscreen, this key will pop up the ARC file
- window again. Any other use of this key is given at the
- appropriate time on the screen.
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 6
-
- F3 - key
- This key will tag a subfile and move the cursor bar on to
- the next subfile. This key also has other functions, and they
- are also shown on the screen when necessary.
-
- F4 - key
- The F4 key will print an image of the screen less the
- advertisement and command lines.
-
- F5 - key
- Invokes the NARC-DOS command processor. You may then enter
- any valid DOS command. When finished, simply hit ENTER
- by itself and you will be returned to NARC. You may also
- enter "COMMAND" which will invoke a second copy of
- COMMAND.COM, if the file COMMAND.COM is in your search path.
- To return to NARC, you would then type "EXIT".
-
- F6 - key
- This key will tag all of the subfiles in the ARChive for
- subsequent extraction.
-
- F7 - key
- This key will invert all of the tags on the subfiles, that
- is all files that were tagged will become untagged and vice-
- versa.
-
- F10 - key
- This key will display the NARC help screen with all of the
- alternate commands listed and a brief description of their
- functions.
-
- (F)ind command.
- Will prompt for a wildcard filename to find in the sub-file
- list. Any number of characters may be used, for example, you
- may enter a single character and NARC will find the first
- file whose name begins with that character. Alternatively,
- you may enter a complete wildcard specification and NARC will
- attempt to find a match.
-
- (K)ill file command.
- Will remove the currently highlighted sub-file from the
- ARChive. No additional disk space is required for temporary
- files.
-
- PgUP,PgDN,Home and End
- These keys do what you might expect. They only work when
- you have an ARC file opened and the subfiles are displayed on
- the screen. These functions are nice for large ARC files.
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 7
-
-
- Operating Hints and Philosophy and Shortcuts
-
-
-
- When NARC is first invoked, it pops up the window showing
- all of the ARC files in the current directory. The first
- thing that you must always do is to select an ARC file. The
- reason for this is that when I want to look inside an ARC
- file, I will move to that drive/directory and then call
- NARC. This saves me from having to select where I want to
- look and what drive and all that mess. This way, when the
- program comes up, I can go right to work. If you run NARC
- and then decide to change drives or directories,you MUST
- first select an ARC file from the first window.
-
- When there are no windows popped up, the right mouse
- button (or F2 key) will pop up the ARC file window,
- regardless of where the command line cursor bar is. This is
- handy when you have a lot of ARC's that you want to thumb
- through, with just the 2 mouse buttons or F1 and F2 keys,
- you can look through a whole directory of ARC files in
- nothing flat ! Also along these lines, when the ARC window
- is on the screen, the right mouse button or the ESCape key
- will exit the program.
-
-
- The control system is a little tricky at first,
- but I think you will like it once you get used to it.
-
-
-
- NARC buffers 64k of the input file at one time, thus
- speeding up file operations. The output buffer is 32k and
- should be larger in the next version. NARC requires about
- 170K of RAM to operate.
-
-
- (This prime advertising space for rent)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 8
-
- ══════════════════════
- Error Messages.
- ══════════════════════
- Memory Allocation Error.
- - NARC allocates memory when it is invoked, this says that DOS told
- NARC that there was not enough memory left to run the program
-
- ERROR: Extraction Failed due to CRC error, Hit ENTER
- - After a file is extracted, the CRC contained in the ARC header is
- compared to the CRC that NARC calculates, this message says that
- the two were different. This is the CRC-16 polynomial.
-
- ERROR: Extraction Failed due to FileSize error
- - Same as above, except with filesize
-
- ERROR: Disk File Inconsistency. Hit ENTER
- - This will usually mean that the user has swapped disks just after
- telling NARC to View,Print or Extract and NARC does not recognize
- the file.
-
- ERROR: Incompatible Crunch Format
- - Says that either the stowage code for this file is not supported by
- NARC -OR- there is an error in the ARC header
-
- ERROR: Extraction Failed due to Lack of Disk Space - (A)bort (C)ontinue
- - Abort will stop tagged extraction, continue will try to fit next
- file.
-
- Squeezed File Has a Diseased Decode Tree.
- - When unsqueezing a file, NARC has found a bad entry in the decode
- table.
-
- ERROR: No directory space on destination!
- - Self explanatory
-
- Bad Path Name, Hit ENTER
- - The destination path that the user entered for extraction is not
- a valid DOS pathname, re-enter.
-
- Requires DOS version 2.0 or above.
- - NARC requires DOS 2.0 or above to operate.
-
- Invalid archive file format
- - NARC could not find any ARC headers, this is probably not an ARC file
-
- Warning: Bad archive file header, bytes skipped = xxxxx
- - If an entry has a bad header, NARC will examine the next 64k bytes
- looking for a good header. This is to maintain compatibility with
- ARC v.5.20 which allows self-unpacking ARC files.
-
- Unexpected end of ARChive file
- - Says that NARC couldn't find the last ARC header
-
- No matching file(s) in ARChive
- - ARC file is empty
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 9
-
- ════════════════════════════════════════════
- ARCHIVE FILE FORMATS AND GENERAL INFORMATION
- ════════════════════════════════════════════
-
-
- For Those With a Little More Curiosity...
-
-
-
- The following are the currently supported stowage methods.
-
- 1 unpacked (obsolete)
- 2 unpacked
- 3 packed
- 4 squeezed (after packing)
- 5 crunched (obsolete)
- 6 crunched (after packing) (obsolete)
- 7 crunched (after packing, using faster hash algorithm)
- 8 crunched (after packing, using dynamic LZW variations)
- 9 Squashed c/o Phil Katz (no packing) (var. on crunching)
-
- NOTE: LZW is Lempel-Ziv-Welch crunching algorithm
-
-
-
- A little about the stowage methods.
-
- Packing -
-
- This is the simplest of the storage methods. Suppose that
- you have a line of text and at the end of the line, you
- have 40 spaces. These 40 spaces are compressed into 3 bytes
- in the ARC file. The first byte is the actual character to
- be expanded (in our case a space). The second byte is a
- special "flag" byte that indicates that we need to
- expand these bytes. The third byte is the count byte (in our
- case it would be 40). So you can see that any time the ARC'er
- finds repeated bytes like this, it can compress them into 3
- bytes. The anomalous case to watch out for here is when the
- count byte is the same character as the "flag" byte, this
- proved to be a difficult roach to kill !
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 10
-
- ══════════════════════════
- HUFFMAN CODING (SQUEEZING)
- ══════════════════════════
-
-
-
- It does, at first, seem that making a file smaller would be
- an impossible task. I will make an attempt here to shed a
- little light on this subject since that is a question that I
- hear pretty frequently and it is not a two minute discussion
- question. It does require some thought.
-
- To compress a file with the Huffman algorithm, commonly
- called squeezing, the first thing that must be done is
- to read the file completely and count the occurrences of
- each character. That is you count the "A" 's and the "B" 's
- and so forth. There are 256 characters in the extended
- ASCII character set, of which approximately 90 are
- "printable", that is you can see them on the screen. The
- IBM set has more "printables", but that is of no
- consequence, since the squeezer deals only with the numbers
- and doesn't care whether or not the file is an ASCII text
- file or an EXE file. Once the squeezer has counted the
- occurrences of each character, thus the frequency of
- occurrence, it scans the table for the characters that
- appear the least number of times and forms an imaginary
- link between them, called a node. Somewhere else in the
- tree, we will later develop a pointer that points to this
- node. When you start putting all of these things
- together, you will form a binary tree in memory. Confused
- enough ? Let us try an example.
-
- We have a file that is 100 bytes long and has 6
- different characters in it. We have counted the
- occurrence of each of the characters and found the following.
-
-
- quantity character
-
- 5 - D
- 10 - A
- 10 - F
- 20 - B
- 25 - E
- 30 - C
-
- The spelling in the file wasn't very good, but we don't
- care. Now we take these numbers and will call them the
- frequency of each character. We then arrange the table as
- below. This is an arbitrary arrangement, but it is useful
- here so as to make our tree readable on the screen. The
- arrangement makes no difference.
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 11
-
- Frequency 20 10 5 10 30 25
-
- Character B A D F C E
-
-
- We then examine the table to find the two characters with
- the smallest frequency of occurrence. In our case, it is
- obvious that one of them is 5,but which 10 do we choose. As
- it turns out, it doesn't matter which one you choose, we will
- arbitrarily choose the F. We draw lines from the D and the F
- to form our node (the box below).
-
-
- Frequency 30 10 5 10 20 25
-
- Character C A D F B E
- \ /
- \ /
- ╔══╗
- ║15║ = 5 + 10
- ╚══╝
-
-
- The number in the box is the sum of the frequencies of
- the D and F characters. Now we again look for the lowest
- two frequencies, except, this time we do not consider the D
- and F characters individually, we instead consider the node.
- The lowest two now are the A and the node, that is 10 and
- 15. We again do some artwork.
-
-
- Frequency 30 10 5 10 20 25
-
- Character C A D F B E
- \ \ /
- \ \ /
- \ ╔══╗
- \ ║15║ = 5 + 10
- \ ╚══╝
- \ /
- \ /
- ╔══╗
- ║25║ = 10 + 15
- ╚══╝
-
-
- We look at the table again for the next two lowest
- frequencies and now find B and E . We continue in this
- fashion until the entire "tree" is built, that is until it
- all condenses to one node. The leaves are the actual
- characters at the top of the tree and the nodes represent
- branch joints with the root at the bottom.
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 12
-
- Frequency 30 10 5 10 20 25
-
- Character C A D F B E
- \ \ \ / \ /
- \ \ \ / \ /
- \ \ ╔══╗ ╔══╗
- \ \ ║15║ ║45║
- \ \ ╚══╝ ╚══╝
- \ \ / /
- \ \ / /
- \ ╔══╗ /
- \ ║25║ /
- \ ╚══╝ /
- \ / /
- \ / /
- ╔══╗ /
- ║55║ /
- ╚══╝ /
- \ /
- \ /
- ╔════╗
- ║ROOT║
- ╚════╝
-
-
- Now that our tree is made up, we can encode the file. We
- start at the root (always). To encode the first character
- (leaf) of the tree (the letter C), we trace up the tree until
- we hit the letter C at the top. Along our journey, if we
- make a left turn, we record a 0 bit, and a 1 for a right
- turn. So for the C, we would go left to 55 (and record a 0)
- and then left again to the letter C (and record another 0),so
- the Huffman code for our letter C is 00. For A we go left to
- 55, right to 25 and left to A and it becomes 010. By doing
- all of the letters this way, we find the following.
-
-
- C = 00 ( 2 bits )
- A = 010 ( 3 bits )
- D = 0110 ( 4 bits )
- F = 0111 ( 4 bits )
- B = 10 ( 2 bits )
- E = 11 ( 2 bits )
-
- Mind that the zeroes and ones above are bits and not
- bytes. Each character was represented in the original
- file by 8 bits (one byte) and since we have reduced the
- number of bits needed to represent each character, we
- therefore reduce the size of the file. The savings add up as
- follows,
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 13
-
- character frequency original bits squeezed bits savings
-
- C 30 30 x 8 = 240 30 x 2 = 60 180
- A 10 10 x 8 = 80 10 x 3 = 30 50
- D 5 5 x 8 = 40 5 x 4 = 20 20
- F 10 10 x 8 = 80 10 x 4 = 40 40
- B 20 20 x 8 = 160 20 x 2 = 40 120
- E 25 25 x 8 = 200 25 x 2 = 50 150
- ══════════ ══════ ═════ ═════
- Totals 100 800 240 560
- │ │
- original file size ──────┘ │
- squeezed file size ───────────────────────┘
-
-
- 240 is 30% of 800, so we have compressed this file by
- 70%. Golly Wally, that seems pretty good. The rub lies in the
- fact that in order to reconstruct the original file, we
- must have access to the decode tree and since each tree
- will be different for each file, we must therefore save the
- tree with the file.It turns out that the tree can have only
- 256 nodes in a bytewise compression technique and each node
- will hold 4 bytes as pointers,a full table will be about 1k
- long. The table in our example has 5 nodes plus the 6 leaf
- nodes (where our characters are), totaling 11. 4 times 11
- is 44 and if we add a few bytes for storing the node count
- and some other statistics, our table is about 50 bytes long.
- If we look at the 240 in the above table this gives the total
- number of bits that it will take to encode the file, divide
- 240 by 8 to get the number of bytes (30) and add it to our
- 50, we get a compressed file size of 80 bytes. Since our
- original file was 100 bytes, we have achieved a 20% reduction
- in file size. Not bad. What we have really accomplished is a
- translation of character sets, with our new set requiring
- less space than the original ASCII set.
-
- How far can we go ?
-
- If we look at the maximums that we can obtain for the
- different bit combinations in a optimally skewed tree, that
- is a tree that is not exactly symmetrical, we find that we
- can have only 4 - 2 bit codes, 8 - 3 bit codes, 16 - 4 bit
- codes, 32 - 5 bit codes, 64 - 6 bit codes, 128 - 7 bit codes,
- the remaining 4 will be 8 bit codes.
-
-
- 2 - 1 bit codes
- 4 - 2 bit codes
- 8 - 3 bit codes
- 16 - 4 bit codes
- 32 - 5 bit codes
- 64 - 6 bit codes
- 128 - 7 bit codes
- --------
- 254
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 14
-
- And since we have a total of 256 different bytes to
- encode, the remaining 2 characters must have 8 bit codes. If
- we add the number of bits that this represents,we find a
- total of 1554 bits or 195 bytes. So at maximum, we have
- compressed the 256 bytes to 195 or 33%, thus the idealistic
- maximum that can be achieved with the Huffman algorithm is
- 33% when using a byte level implementation.
-
- One final note; The Huffman scheme requires the input file
- to be read twice, once to count characters and frequencies
- and then again to do the actual encoding. The major
- differences in Huffman coding and crunching lie in the fact
- that crunching is a one pass operation and does not require
- the table to be stored with the file. Both, however, are
- extremely vulnerable to errors, for example, imagine what
- would happen if you skipped one bit when squeezing the file,
- all of the remaining characters in the file would become the
- proverbial garbage, since we are looking at the file on a bit
- level.
-
-
- NARC uses the method described in K. & R. pp. 130 for
- setting up the binary tree with several modifications. The
- simple binary tree is acceptable for this, since the tree
- never grows and therefore will never become unbalanced.
-
- If you followed that, now go back and read the second
- paragraph again, maybe it will make sense this time.
-
- ══════════════════════
- CRUNCHING
- ══════════════════════
-
-
- Crunching began with an article by J. Ziv and A. Lempel
- in IEEE Trans. Information Theory, May 1977, where the
- method was originally described. Terry A. Welch wrote a
- definitive application article in IEEE Computer, June 1984
- which described in detail how to apply the algorithm and
- some common problems encountered. Thus the name LZW
- compression.
-
- Crunching takes the Huffman coding method a step
- further as it does not include a table with the crunched
- file. The crunching algorithm also "learns" as it proceeds
- through the file. If it finds repeated strings in the file,
- they will be encoded into a table. This table is s et up (in
- NARC's implementation) as a 4096 by 3 table. Each entry
- is formatted as <PREF>,<SUFFIX>, where PREF is a 2 byte
- pointer to another entry. SUFFIX is the byte for this
- entry. Representing the PREF's as pointers rather than values
- speeds up most operations in NARC. This idea came from Bob
- Freed and is very trick.
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 15
-
- One obvious benefit of crunched files is the fact that
- there is no need to include the encoding table in the
- compressed file as was the case with squeezing. Another great
- benefit is the fact that crunching is a one pass operation
- as opposed to the two pass situation in squeezed files.
-
- Crunching begins by creating an "atomic" table, that is
- a table in RAM that contains 256 entries, one for each
- character in the extended ASCII set. The values range
- sequentially from 0 to 255. The table entries are arranged as
- follows.
-
- Prefix Pointer (2 bytes) and Suffix byte (1 byte)
-
- In this initial table setup, the Suffix bytes are the 0
- through 255 mentioned above. These are the "atomic"
- characters, in that they must be in the table before
- crunching or uncrunching can begin, since all files contain
- some portion of this character set. We do not know which
- characters will be included in any given file and which ones
- will be excluded,so we must include them all in our initial
- table. Once this table is set up, we can begin crunching.
-
- The Prefix pointer will contain a value that is a pointer
- to another string. When the table is initially set up, there
- are no other strings, so this Prefix pointer is set to a
- special "Null" string, that is it points nowhere.We must be
- careful when crunching the file, to look for this special
- pointer and act accordingly when we encounter it.
-
- This Prefix and Suffix business is used to "build"
- long strings. If we read the input file and found that the
- first character was the letter "I", we would look this
- letter up in the string table by hashing (computing an
- address). If we found the letter in the table (which we
- certainly will on the first character), then we output it's
- "hashed" address as a code to the output file (the
- crunched file). Suppose then, that the next character
- input from the file was the letter "D", the cruncher would
- then look at the I and the D together to see if they exist
- as a string in the table. Well of course, since this is the
- second character of the file, we know that it doesn't, so
- the cruncher forms a new entry in the string table. This
- entry has as its' Prefix pointer, a value that "points" to
- the letter "I" entry in the table, that we made a minute
- ago. The suffix byte in this case would be the letter
- "D". Now another code is output to the crunched file,
- representing the letter "D". Well this is great, we have
- now turned 2 bytes from the input file (16 bits) into 3 bytes
- in the output file (24 bits). You are correct, crunching
- begins by "not crunching" , but it is a crazy world ! The
- real value becomes apparent when we run into this same
- sequence "ID" in the input file again. Now we will find an
- entry for it in the string table and we can output a single
- 12 bit code that stands for "ID", thus saving 4 bits. The
- cruncher continues "learning" strings like this until the
- file is crunched. It should be noted that the string table
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 16
-
- is dynamically changing as the input file is processed.
-
- The early versions of crunching implemented, stopped
- "learning" once the string table was full. This gave a very
- poor compression ratio in some files. Versions 8 and 9 have
- an additional feature called adaptive reset, where the string
- table is cleared and crunching begins all over again ! This
- capability really helps the larger files more than smaller
- files.
-
-
-
-
- Details of Storage Versions
-
-
- NARC supports all of the current "skrunching" algorithms. A
- brief description of each follows.
-
- Version 1
-
- - "STORED" File is simply stored (obsolete now, 25 byte
- header)
-
- NOTE: Files stored with this version are rare.
-
- Version 2
-
- - "STORED" Current version of simple storage. This
- version was implemented with the implementation of file
- compression. The main difference in version 1 and 2 is
- the ARC header (see header section below), version 1
- has a header length 4 bytes smaller than any of the rest
- of the storage methods since in version 1 there was no
- need to store the original file length separately from
- the stored file length since they were the same.
-
- Version 3
-
- - "PACKED" Repeated bytes are packed into a three byte
- string (see Packing above)
-
- Version 4
-
- - "SQUEEZED" after packing. The file is first packed as
- described above and then squeezed
-
- Version 5
-
- - "CRUNCHED" This is the first implementation of LZW
- released. Uses fixed length codes and a complex hashing
- function. (obsolete now) (See hashing below)
-
- NOTE: Files compressed with this version are hard to find.
- Version was released only one month when next version
- came out.
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 17
-
- Version 6
-
- - "CRUNCHED" after packing. The file is first packed and
- then crunched. Uses fixed length codes and the same
- hashing function as version 5.
-
- Version 7
-
- - "CRUNCHED" after packing. Same as version 6 except a
- faster hashing function is used.
-
- NOTE: Thom Henderson (author of ARC) has this to say about
- version 7. "This approach was abandoned because dynamic
- Lempel-Ziv works as well, and packs smaller also. However
- inadvertent release of a developmental copy forces us to
- leave this in."
-
-
- Version 8
-
- - "CRUNCHED" after packing. Uses variable length codes
- in the crunched file (9 to 12 bits) and has a faster
- hash function (actually not hashing at all, but for the
- sake of consistency, we will call it that). This version
- also resets the string table when it becomes full which
- benefits the compression ratio of larger files. This
- resetting is commonly called an adaptive reset.
-
-
- Version 9
-
- - "SQUASHING" (variation on crunching scheme). This
- version uses the same hashing function as version 8 but
- varies the crunching codes from 9 to 13 bits. There is
- also no packing, which affords the string table the
- opportunity to "learn" longer codes and thus improve the
- compression ratio (benefits "real large" files).
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 18
-
-
-
- ARC file header structure is same for both DOS and CP/M
-
- Byte number Value(s) Meaning
- ══════════════════════════════════════════════════════════════════════════
- 1 1Ah Header Flag
- 2 0-9 Compression Version
- 3-15 --- ASCIIZ compressed filename
- 16-19 --- Compressed file size in bytes
- Low Word, High Word with each word
- in LoHi format
- 20-21 bits DOS date format
- 15-9 Year
- 8-5 Month
- 4-0 Day (all zeroes means no date)
- 22-23 bits DOS time format
- 15-11 Hours (military)
- 10-5 Minutes
- 4-0 Seconds
- 24-25 --- CRC-16 in LoHi format of uncompressed
- file. ------- This is important.
- 26-29 --- Original uncompressed file size
- NOTE: Version 1 files are not compressed
- so the length can be found with
- bytes 16-19, also the header len
- for version 1 files is 25 bytes.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 19
-
- Hashing.....
-
- Hashing is simply an arithmetic way of coming up with
- an address in a table. You have a set of input numbers
- (codes) that will map one-to-one with the output codes in
- an ideal situation. That is, each time you input a certain
- number, you can rest assured that the output will always
- return the same output number. This is not quite the case in
- the current versions of .ARC files.The reason is that the
- algorithm would require a MEG or so of RAM for
- implementation. The utilization of a smaller string table in
- all of the ARC programs introduces the possibility of
- producing the same output number for 2 or more input
- numbers. This is called a hash collision. This is handled
- separately in .ARC files with what is called a "collision
- table", which helps to locate the correct table entry.
-
-
- There are three versions of "hashing" used in ARC files.
-
- Hash1 - Key = upper 12 bits of lower 18 bits of unsigned square of
- (prefix code + suffix byte) OR 800h
-
- Used in stowage versions 5 and 6
-
- Hash2 - Key = lower 12 bits of unsigned product of
- (prefix code + suffix byte) * 15073
-
- Used in stowage version 7
-
- Hash3 - Key = next available address in table.
-
- Used in stowage versions 8 and 9
-
-
-
-
- CRC calculations -
-
- NARC does not use the traditional table lookup method
- for calculating the CRC of the file. The table approach
- requires the table to be in RAM and takes up more space.
- NARC calculates the CRC on the fly, which requires no table
- and saves space. The algorithm is taken from the definitive
- article by Aram Perez in IEEE Micro, June '83. The
- polynomial is X^16 + X^15 + X^2 + X^1 which is not compatible
- with the Xmodem CRC.
-
-
- Gary Conway
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 20
-
- ══════════════════════════════
- ARC RELEASE DATES AND VERSIONS
- ══════════════════════════════
-
-
-
- These are the various versions of ARC.EXE that I have and
- what versions of storage they supported. PKxARC supports all
- of these methods as well since they were all originally
- created by ARC.EXE.
-
-
- Date Stowage Methods
- Released Version Supported
-
- 05-01-85 3.10 Storing,Packing,Squeezing (1-4)
- ( version 5 in here somewhere)
- 06-26-85 4.10 Up to version 6 of crunching
- 11-18-85 4.50 Up to version 6 of crunching
- 12-04-85 4.52 Up to version 6 of crunching
- ( version 7 in here somewhere)
- 01-21-86 5.00 Up to version 8 of crunching
- 01-31-86 5.10 Up to version 8 of crunching
- 02-05-86 5.12 Up to version 8 of crunching
- 10-24-86 5.20 Up to version 8 of crunching
-
-
- This list is compiled in an attempt to start some kind of
- historical record as to what transpired in the ARC world. I
- would be interested in hearing of additions.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 21
-
- ═════════════════════════
- NARC.EXE REVISION HISTORY
- ═════════════════════════
-
- Version 1.0 - First Release 6.10.87 (Beta Release)
-
- Version 1.1 - Released 6.16.87 (Beta Release)
- - Fixed bug in tagged extraction to other than default path
- - Deletes partial file when extraction fails due to lack of disk space
- - Fixed bug dealing with improper end of file condition in HEX view
- - Stronger ARChive header checking. Handles special case where two
- 1Ah bytes are encountered after a bad header
- - Fixed bug with PK zero length files
- - Introduction of NARCCFG.EXE, replacing NARCLOAD.EXE and NARC.CFG
-
- Version 1.2 - Released 8.10.87 (First "real" release)
- - Fixed bug in ARC window when no ARC files. NARC wants F1,F2 or F3
- but bug caused a wrong keypress to send "twirping" sound to spkr.
- Program seemed to be locked up, but ALT key would continue.
- - added PAGE UP, PAGE DOWN, HOME and END functions (great for big ARCs)
- - corrected bug when using some of the single letter commands in
- extracting (this was a stupid mistake on my part)
- - added FIND command for locating sub-files (great for big ARCs)
- - added file deletion
- - added abort option to tagged extraction when disk full
- - added display of number of tagged files and tagged bytes
- - added formatted display of tagged file and tagged bytes
- - changed write color in Blank routine
- - enhanced extract and print abort
- - check disk space BEFORE extraction
- - added help facility
- - 70% FASTER EXTRACTION !
- - added code to check for seek past EOF in when bad ARC file has
- enormous numbers for file sizes
- - fixed ELUSIVE bug in repeated byte expansion when the count byte
- was the REP byte
- - added Tag All option (F6 key)
- - added invert all tags option (F7 key)
- - added sound toggle to NARCCFG.EXE (turns sound on and off)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- End of NARC.DOC Copyright (c) 1987 Infinity Design Concepts
-
- Page 22